deep convolutional network
1b742ae215adf18b75449c6e272fd92d-AuthorFeedback.pdf
We thank all the reviewers for their time and effort in providing feedback. For clarity, we would like to reiterate the goal and motivation of the paper. In general, we do not have access to the target network, but only to the labeled training data. As optimizing a ReLU neural network is itself NP-Hard in general, we expect all algorithms to be inefficient in the worst case. Thus, the approximated network achieved 97.17% test set accuracy with 44.69% sparsity.
Theoretical Analysis of the Inductive Biases in Deep Convolutional Networks
In this paper, we provide a theoretical analysis of the inductive biases in convolutional neural networks (CNNs). We start by examining the universality of CNNs, i.e., the ability to approximate any continuous function. We prove that a depth of $\mathcal{O}(\log d)$ suffices for deep CNNs to achieve this universality, where $d$ is the input dimension. Additionally, we establish that learning sparse functions with CNNs requires only $\widetilde{\mathcal{O}}(\log^2d)$ samples, indicating that deep CNNs can efficiently capture {\em long-range} sparse correlations. These results are made possible through a novel combination of multichanneling and downsampling when increasing the network depth.
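The logarithmic depth is easier to picture with a small counting sketch. The snippet below is only an illustration, not the construction used in the paper: it assumes each layer downsamples by a fixed stride (the helper name `layers_to_cover` and the default stride of 2 are hypothetical) and counts how many such layers shrink an input of length $d$ to a single position, at which point one unit can depend on every input coordinate.

```python
import math


def layers_to_cover(d: int, stride: int = 2) -> int:
    """Count downsampling layers needed to shrink an input of length d to 1.

    Illustrative assumption: every layer reduces the length by `stride`
    (rounding up), as a strided convolution or pooling layer would.
    """
    if d < 1 or stride < 2:
        raise ValueError("need d >= 1 and stride >= 2")
    layers = 0
    length = d
    while length > 1:
        length = math.ceil(length / stride)  # apply one downsampling layer
        layers += 1
    return layers


if __name__ == "__main__":
    for d in (16, 1024, 65536):
        # The layer count grows like log2(d), not like d.
        print(d, layers_to_cover(d))
```

Under these assumptions the count equals $\lceil \log_2 d \rceil$ for stride 2, which matches the order of the depth bound quoted in the abstract.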
Supplementary Material: Aligned Structured Sparsity Learning for Efficient Image Super-Resolution
Our proposed aligned structured sparsity learning (ASSL) algorithm is summarized in Algorithm 1. There are in total 16 residual blocks in EDSR_baseline. We provide more visual comparisons in Figure 1. In contrast, our ASSLN can better recover more structural details. Our ASSLN can also better alleviate the blurring artifacts.
CrossStateECG: Multi-Scale Deep Convolutional Network with Attention for Rest-Exercise ECG Biometrics
Zheng, Dan, Feng, Jing, Liu, Juan
Current research in Electrocardiogram (ECG) biometrics mainly emphasizes resting-state conditions, leaving the performance decline in rest-exercise scenarios largely unresolved. This paper introduces CrossStateECG, a robust ECG-based authentication model explicitly tailored for cross-state (rest-exercise) conditions. The proposed model creatively combines multi-scale deep convolutional feature extraction with attention mechanisms to ensure strong identification across different physiological states. Experimental results on the exercise-ECGID dataset validate the effectiveness of CrossStateECG, achieving an identification accuracy of 92.50% in the Rest-to-Exercise scenario (training on resting ECG and testing on post-exercise ECG) and 94.72% in the Exercise-to-Rest scenario (training on post-exercise ECG and testing on resting ECG). Furthermore, CrossStateECG demonstrates exceptional performance across both state combinations, reaching an accuracy of 99.94% in Rest-to-Rest scenarios and 97.85% in Mixed-to-Mixed scenarios. Additional validations on the ECG-ID and MIT-BIH datasets further confirmed the generalization abilities of CrossStateECG, underscoring its potential as a practical solution for post-exercise ECG-based authentication in dynamic real-world settings.
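The general idea of combining multi-scale filtering with attention can be sketched in a few lines. This is a toy illustration and not the CrossStateECG architecture: it assumes fixed uniform kernels in place of learned convolutions, a single pooled value per scale, and a softmax over those values as the attention weights. All function names and kernel sizes here are hypothetical.

```python
import math


def moving_average(signal, k):
    """'Valid' 1-D convolution of the signal with a uniform kernel of width k."""
    return [sum(signal[i:i + k]) / k for i in range(len(signal) - k + 1)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_scale_attention(signal, kernel_sizes=(3, 5, 7)):
    """Filter at several scales, then fuse the pooled responses with attention.

    Assumption: each scale is summarised by its mean absolute response, and
    the same summaries serve as the attention scores.
    """
    feats = []
    for k in kernel_sizes:
        response = moving_average(signal, k)
        feats.append(sum(abs(v) for v in response) / len(response))
    weights = softmax(feats)  # one attention weight per scale
    fused = sum(w * f for w, f in zip(weights, feats))
    return weights, fused


if __name__ == "__main__":
    beat = [0.0, 0.1, 0.9, 0.2, -0.3, 0.0, 0.1, 0.0, 0.05, 0.0]
    weights, fused = multi_scale_attention(beat)
    print(weights, fused)
```

In a trained model the kernels and the attention scores would be learned from data. The sketch only shows how per-scale features can be reweighted and merged into one descriptor.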
Sparks of Explainability: Recent Advancements in Explaining Large Vision Models
This thesis explores advanced approaches to improve explainability in computer vision by analyzing and modeling the features exploited by deep neural networks. Initially, it evaluates attribution methods, notably saliency maps, by introducing a metric based on algorithmic stability and an approach utilizing Sobol indices, which, through quasi-Monte Carlo sequences, allows a significant reduction in computation time. In addition, the EVA method offers a first formulation of attribution with formal guarantees via verified perturbation analysis. Experimental results indicate that in complex scenarios these methods do not provide sufficient understanding, particularly because they identify only "where" the model focuses without clarifying "what" it perceives. Two hypotheses are therefore examined: aligning models with human reasoning -- through the introduction of a training routine that integrates the imitation of human explanations and optimization within the space of 1-Lipschitz functions -- and adopting a conceptual explainability approach. The CRAFT method is proposed to automate the extraction of the concepts used by the model and to assess their importance, complemented by MACO, which enables their visualization. These works converge towards a unified framework, illustrated by an interactive demonstration applied to the 1000 ImageNet classes in a ResNet model.
Theoretical Analysis of the Inductive Biases in Deep Convolutional Networks
In this paper, we provide a theoretical analysis of the inductive biases in convolutional neural networks (CNNs). We start by examining the universality of CNNs, i.e., the ability to approximate any continuous function. We prove that a depth of $\mathcal{O}(\log d)$ suffices for deep CNNs to achieve this universality, where $d$ is the input dimension. Additionally, we establish that learning sparse functions with CNNs requires only $\widetilde{\mathcal{O}}(\log^2 d)$ samples, indicating that deep CNNs can efficiently capture {\em long-range} sparse correlations. These results are made possible through a novel combination of multichanneling and downsampling when increasing the network depth.
RenderNet: A deep convolutional network for differentiable rendering from 3D shapes
Traditional computer graphics rendering pipelines are designed for procedurally generating 2D images from 3D shapes with high performance. The nondifferentiability due to discrete operations (such as visibility computation) makes it hard to explicitly correlate rendering parameters and the resulting image, posing a significant challenge for inverse rendering tasks. Recent work on differentiable rendering achieves differentiability either by designing surrogate gradients for non-differentiable operations or via an approximate but differentiable renderer. These methods, however, are still limited when it comes to handling occlusion, and restricted to particular rendering effects. We present RenderNet, a differentiable rendering convolutional network with a novel projection unit that can render 2D images from 3D shapes.
Multimodal Trajectory Prediction for Autonomous Driving on Unstructured Roads using Deep Convolutional Network
Li, Lei, Chen, Zhifa, Wang, Jian, Zhou, Bin, Yu, Guizhen, Chen, Xiaoxuan
Recently, the application of autonomous driving in open-pit mining has garnered increasing attention for achieving safe and efficient mineral transportation. Compared to urban structured roads, unstructured roads in mining sites have uneven boundaries and lack clearly defined lane markings. This leads to a lack of sufficient constraint information for predicting the trajectories of other human-driven vehicles, resulting in higher uncertainty in trajectory prediction problems. We propose a method to predict multiple possible trajectories of the target vehicle, along with their probabilities. The surrounding environment and historical trajectories of the target vehicle are encoded as a rasterized image, which is used as input to our deep convolutional network to predict the target vehicle's multiple possible trajectories. The method underwent offline testing on a dataset specifically designed for autonomous driving scenarios in open-pit mining and was compared and evaluated against a physics-based method.
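The rasterization step mentioned above can be illustrated with a minimal sketch. The abstract does not give the encoding details, so everything here is assumed for illustration: a square grid centred on the target vehicle, coordinates in metres, a single binary channel, and the hypothetical helper name `rasterize`. A real pipeline would typically add further channels for road boundaries and other vehicles.

```python
def rasterize(points, size=8, extent=40.0):
    """Mark (x, y) positions on a size x size occupancy grid.

    Assumptions: coordinates are in metres relative to the target vehicle,
    the grid spans `extent` metres per side, and the vehicle is at the centre.
    """
    grid = [[0] * size for _ in range(size)]
    cell = extent / size   # side length of one grid cell in metres
    half = extent / 2.0
    for x, y in points:
        col = int((x + half) // cell)
        row = int((half - y) // cell)  # image rows grow downward
        if 0 <= row < size and 0 <= col < size:
            grid[row][col] = 1  # points outside the grid are dropped
    return grid


if __name__ == "__main__":
    # Hypothetical history: the vehicle approached its current position from behind.
    history = [(0.0, -10.0), (0.0, -5.0), (0.0, 0.0)]
    for row in rasterize(history):
        print("".join("#" if v else "." for v in row))
```

The resulting grid is the kind of image-like tensor a convolutional network can consume. The network's output head, which would emit several candidate trajectories with probabilities, is not shown.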